Part-of-speech Tagging in French Text-to-Speech: Experiments in Tagset
Abstract
Part-of-speech tagging is needed in French text-to-speech (TTS) synthesis to disambiguate the pronunciation of heterophonous homographs and of liaison contexts, and eventually to model intonational contours. A core problem in part-of-speech tagging for French TTS is deciding on the tagset used by the tagger relative to the tagset needed by TTS. To investigate this problem, we carried out a number of experiments with tagsets of several sizes and with several tagging algorithms. Our experimental results suggest that there may be an optimal tagset for part-of-speech disambiguation in French TTS. This optimal tagset contains slightly more tags than the tagset that TTS needs for pronunciation disambiguation and intonational modeling. In our experiments, the optimal tagset yields a tagging accuracy of 98.4% for TTS when a trigram Hidden Markov Model tagger is used.
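The abstract gives no implementation details, so the following is only a minimal sketch of the general setup it describes: a trigram HMM tagger that predicts tags from a finer internal tagset, whose output is then collapsed onto the smaller tagset that TTS consumes before accuracy is measured. The toy corpus, the tag names, the mapping `FINE_TO_TTS`, add-one smoothing, and the helper names `train`, `viterbi` and `tts_accuracy` are all hypothetical illustrations, not the paper's tagsets or method. The example word "président" is a heterophonous homograph: as a noun ("le président") it is pronounced differently from the verb form ("ils président").

```python
import math
from collections import defaultdict

# Hypothetical internal (fine) tags and a hypothetical mapping onto a smaller
# TTS tagset; neither is taken from the paper.
FINE_TO_TTS = {"DET:SG": "DET", "NOUN:SG": "NOUN", "VERB:3SG": "VERB",
               "VERB:3PL": "VERB", "PRON:3PL": "PRON"}
START, END = "<s>", "</s>"

# Toy tagged corpus (illustrative only).
TRAIN = [
    [("le", "DET:SG"), ("président", "NOUN:SG"), ("parle", "VERB:3SG")],
    [("ils", "PRON:3PL"), ("président", "VERB:3PL"), ("la", "DET:SG"), ("séance", "NOUN:SG")],
    [("le", "DET:SG"), ("président", "NOUN:SG"), ("préside", "VERB:3SG")],
    [("elles", "PRON:3PL"), ("président", "VERB:3PL")],
]


def train(sentences):
    """Count tag trigrams and (tag, word) emissions from tagged sentences."""
    trans = defaultdict(lambda: defaultdict(int))  # (t1, t2) -> t3 -> count
    emit = defaultdict(lambda: defaultdict(int))   # tag -> word -> count
    vocab = set()
    for sent in sentences:
        for word, tag in sent:
            emit[tag][word] += 1
            vocab.add(word)
        seq = [START, START] + [tag for _, tag in sent] + [END]
        for i in range(2, len(seq)):
            trans[(seq[i - 2], seq[i - 1])][seq[i]] += 1
    return trans, emit, vocab


def viterbi(words, model):
    """Most likely fine-tag sequence under a trigram HMM with add-one smoothing."""
    trans, emit, vocab = model
    tags = list(emit)
    n_next = len(tags) + 1  # any tag, or the end-of-sentence symbol

    def lp_trans(a, b, c):
        row = trans.get((a, b), {})
        return math.log((row.get(c, 0) + 1) / (sum(row.values()) + n_next))

    def lp_emit(t, w):
        row = emit[t]
        # The extra +1 in the denominator reserves mass for unseen words.
        return math.log((row.get(w, 0) + 1) / (sum(row.values()) + len(vocab) + 1))

    # Markov state = last two tags; value = (log score, best tag path so far).
    best = {(START, START): (0.0, [])}
    for w in words:
        nxt = {}
        for (a, b), (score, path) in best.items():
            for c in tags:
                s = score + lp_trans(a, b, c) + lp_emit(c, w)
                if (b, c) not in nxt or s > nxt[(b, c)][0]:
                    nxt[(b, c)] = (s, path + [c])
        best = nxt
    # Close the sentence with the transition into END and keep the best path.
    _, (_, path) = max(best.items(),
                       key=lambda kv: kv[1][0] + lp_trans(kv[0][0], kv[0][1], END))
    return path


def tts_accuracy(pred_fine, gold_fine):
    """Accuracy measured after collapsing both sequences onto the TTS tagset."""
    pairs = list(zip(pred_fine, gold_fine))
    hits = sum(FINE_TO_TTS[p] == FINE_TO_TTS[g] for p, g in pairs)
    return hits / len(pairs)


if __name__ == "__main__":
    model = train(TRAIN)
    fine = viterbi(["ils", "président"], model)
    print(fine, [FINE_TO_TTS[t] for t in fine])
```

On this toy corpus, `viterbi(["ils", "président"], model)` returns `['PRON:3PL', 'VERB:3PL']`, which maps to `['PRON', 'VERB']`, while `["le", "président", "parle"]` is tagged `['DET:SG', 'NOUN:SG', 'VERB:3SG']`. Because accuracy is computed after the mapping, a confusion inside one TTS class (for example `VERB:3SG` versus `VERB:3PL`) does not count as an error, which is one way a tagger can use more tags internally than TTS needs.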
Publication date: 2002